INTRODUCTION

Intron F of the P120 gene belongs to the minor class of non-conventional introns, which have
non-canonical splice site and branch site sequences. Co-transfection experiments in cultured CHO
cells indicated that splicing of this non-conventional intron required U12 snRNA, which is not
required for conventional intron splicing (Hall and Padgett, 1995 Cold Spring Harbor Laboratory
RNA Processing Meeting, published in 1996). I previously proposed to study the biochemical
mechanism of this novel class of intron splicing (August 1995). By the time the fellowship was
granted (December 1996), however, significant progress had already been made in this field.
In vitro studies in HeLa nuclear extract revealed that, in contrast to conventional intron splicing,
non-conventional intron splicing required a common snRNP (U5) together with unique snRNPs
(U11, U12, U4atac and U6atac) (Tam and Steitz, 1996a, 1996b; Yuo and Steitz, 1997). U11, U12,
U4atac and U6atac function in the non-conventional spliceosome as analogues of U1, U2, U4 and
U6, respectively, in the conventional spliceosome (Hall and Padgett, 1996; Tam and Steitz, 1996a,
1996b; Kolossova and Padgett, 1997; Yuo and Steitz, 1997). The U1, U2, U4 and U6 snRNPs were
shown not to be required for the splicing of non-conventional introns and are in fact absent from
the non-conventional spliceosome. In some cases, depletion of U1 or U2 snRNA from the extract was
necessary to detect splicing of a non-conventional intron (Tam and Steitz, 1996). An interesting
discovery made in my mentor's laboratory is that there is cross-talk between adjacent conventional
and non-conventional introns of the same gene (Wu and Krainer, 1996). In the in vitro splicing
system, splicing of the non-conventional intron 2 of the sodium channel gene (which has the same
elements present in P120 intron F but splices more efficiently in vitro) was greatly stimulated when
the downstream exon 3 was followed by the 5' splice site from the conventional intron 3. This
stimulation was dependent on U1 snRNP, which is required only for conventional intron splicing
(Wu and Krainer, 1996). The cross-talk between these two distinct intron splicing pathways is likely
a reflection of exon definition (Robberson et al., 1990) and is most probably mediated by exon
splicing enhancers (ESEs), as previously reported for adjacent conventional introns (see below).
Recent unpublished experiments from our laboratory have indeed shown that purine-rich exonic
splicing enhancers can activate splicing of non-conventional introns. Therefore, over the past year I
concentrated my efforts on understanding the molecular basis of exonic splicing enhancer function
and the specific recognition of these enhancers by trans-acting factors, the SR proteins.

Three major aspects of the exon sequence have been shown to be relevant to splice-site
selection: (i) the strength of the splice sites (Brunak and Engelbrecht, 1991); (ii) the length of the
exon (Dominski and Kole, 1991; Sterner and Berget, 1993); and (iii) positive and negative exon
cis-elements (Lavigueur et al., 1993; Sun et al., 1993; Tian and Maniatis, 1993; Watakabe et al.,
1993; Xu et al., 1993; Caputi et al., 1994; Dirksen et al., 1994; Tanaka et al., 1994; Tian and
Maniatis, 1994; Tsukahara et al., 1994; Amendt et al., 1995; Humphrey et al., 1995;
Ramchatesingh et al., 1995; Staffa and Cochrane, 1995; Gontarek and Derse, 1996; Zheng et al.,
1996). The positive exon cis-elements, known as exon splicing enhancers (ESEs), are often,
though not always, found in a purine-rich context. A well-studied example is the ESE in the
alternative exon M2 of the mouse IgM gene. This 73-nt ESE is essential for splicing of the
preceding intron between exons M1 and M2. The M2 ESE can also stimulate splicing of the
heterologous regulated intron of the Drosophila doublesex gene. Enhancer activity in the context of
the IgM pre-mRNA could also be obtained by inserting certain natural or synthetic purine-rich
sequences in place of the natural ESE. However, deletion of the purine-rich sequences within the
M2 ESE did not abolish its activity completely (Watakabe et al., 1993; Tanaka et al., 1994). In
agreement with this finding, SELEX experiments revealed that certain non-purine-rich sequences
can also function as enhancers (Tian and Kole, 1995; Coulter et al., 1997). Most natural ESEs
have been identified in tissue-specific or developmentally regulated exons, which typically have
weak splice sites and require the ESE for exon inclusion. In some cases, ESEs are specifically
recognized by one or more SR proteins (Lavigueur et al., 1993; Sun et al., 1993; Tian and
Maniatis, 1993; Tian and Maniatis, 1994; Ramchatesingh et al., 1995; Gontarek and Derse, 1996).
In turn, SR proteins are expressed at different levels in different tissues, and their expression also
appears to be regulated by alternative splicing (Jumaa et al., 1997; for review, see Caceres and
Krainer, 1997).

The SR proteins are a family of highly conserved serine/arginine-rich RNA-binding proteins.
They are essential splicing factors (Krainer et al., 1990b; Ge et al., 1991; Krainer et al., 1991;
Zahler et al., 1992) and also regulate the selection of alternative splice sites in a
concentration-dependent manner (Ge and Manley, 1990; Krainer et al., 1990a; Zahler et al., 1993a),
in part by antagonizing the activity of hnRNP A1 (Mayeda and Krainer, 1992). The SR proteins act
very early in spliceosome assembly (Krainer et al., 1990a; Fu, 1993; Staknis and Reed, 1994). They
promote the binding of U1 snRNP to the 5' splice site (Eperon et al., 1993; Wu and Maniatis,
1993; Kohtz et al., 1994; Staknis and Reed, 1994; Zahler and Roth, 1995) and of U2AF65 to the
3' splice site (Wu and Maniatis, 1993), apparently by interacting with U1 70K and U2AF35,
respectively. These observations have led to the hypothesis that SR proteins bound to ESEs recruit
splicing factors to the splice sites of adjacent introns (Wu and Maniatis, 1993; Staknis and
Reed, 1994).

Nine human SR proteins are presently known: SF2/ASF, SC35, SRp20, SRp40, SRp75,
SRp55, 9G8, SRp30c, and the somewhat more divergent p54. These proteins are closely related in
primary structure and share the ability to complement splicing in a HeLa cell S100 extract (Ge et
al., 1991; Krainer et al., 1991; Fu et al., 1992; Zahler et al., 1992; Zahler et al., 1993b; Cavaloc et
al., 1994; Screaton et al., 1995; Zhang and Wu, 1996). SR proteins appear to have partially
redundant functions, such that several different members of the family can complement an S100
extract to splice the same pre-mRNA and/or stimulate use of the same alternative 5' splice sites in
vitro or in vivo (Fu et al., 1992; Zahler et al., 1992). However, substrate-specific differences among
SR proteins in general splicing, enhancer-dependent splicing, or alternative splicing have also been
reported (Fu, 1993; Sun et al., 1993; Zahler et al., 1993a; Caceres et al., 1994; Wang and Manley,
1995). More importantly, Drosophila SRp55/B52 has been shown to be essential for development
(Ring and Lis, 1994; Peng and Mount, 1995), and at least one copy of the chicken SF2/ASF gene is
required for survival of a B-lymphocyte cell line (Wang et al., 1996). Individual SR proteins also
differ in their subnuclear localization signals and in their ability to shuttle between the nucleus and
the cytoplasm (Caceres et al., 1997a; Caceres et al., 1997b). Finally, individual SR proteins exhibit
striking phylogenetic sequence conservation of all their constituent domains (Birney et al., 1993).
Taken together, these observations demonstrate that individual SR proteins have some unique,
specific functions.

Although SR proteins have been clearly implicated in ESE recognition and function, predictive
rules for the recognition of ESEs by different SR proteins have not been derived. In this study, I
sought to determine the specificity of individual SR proteins in ESE recognition by performing a
randomization and selection procedure under splicing conditions.
BODY

1.  Experimental  procedures 

Preparation  of  HeLa  cell  extracts  and  recombinant  SR  proteins 

Nuclear extracts and cytosolic S100 extracts were prepared from fresh 12-liter suspension cultures
of HeLa cells, as described (Mayeda and Krainer, 1997b).

Expression and purification of the authentic form of the recombinant SR proteins SF2/ASF,
SRp40 and SRp55, using the expression vector pET9c (Novagen), were carried out as described
previously (Krainer et al., 1991; Screaton et al., 1995). The integrity and purity of these
recombinant SR proteins were checked by SDS-PAGE, and their specific activities were determined
by in vitro splicing of β-globin pre-mRNA in S100 extract (data not shown).

Randomization  and  selection 

The SELEX procedure is outlined in Figure 1A. The sequence of the wild-type IgM exon M2
is shown in Figure 1B. The plasmids pM1-2 and pMA, which bear a mouse IgM minigene with or
without the natural enhancer, respectively (Watakabe et al., 1993), were a generous gift from Prof.
Y. Shimura. The randomized substrate pool was constructed by overlap-extension PCR (Horton et
al., 1989; Tian and Kole, 1995). Two PCRs were performed using pMA as template: the first with
primers R and A, the second with primers P and B. The products of the two reactions were then
combined and further amplified using primers P and A. The resulting PCR product was used for in
vitro transcription with T7 RNA polymerase to generate a radiolabeled pre-mRNA substrate pool.
The pool of spliced mRNAs generated by in vitro splicing was excised from a urea-polyacrylamide
gel, eluted in 0.5 M ammonium acetate plus 0.1% SDS, reverse transcribed using SuperScript II RT
(GIBCO-BRL), and amplified by PCR using primers P and A. The amplified product was further
amplified using primers S and A. This PCR product was purified on a 2% agarose gel and
re-assembled into the pre-mRNA template by overlap-extension PCR for the next round of
selection. The reverse transcription and PCR reactions were performed as suggested by the vendors
(GIBCO-BRL and Stratagene, respectively). All PCRs were done with the high-fidelity Pfu DNA
polymerase. The primers were purchased from Operon and were used at a concentration of
100 μM. The second and third rounds of SELEX were performed in nuclear extract depleted of total
SR proteins by Mg2+ precipitation (Blencowe et al., 1994). After three rounds of selection, the
amplified spliced products were subcloned into the vector pCR-Blunt (Stratagene) and sequenced
using a Dye Terminator Cycle Sequencing kit and an automated ABI 377 sequencer
(Perkin-Elmer). The sequences of the primers were as follows:

Primer  P:  5'-ATTTAGGTGACACTATAGAATAC-3' 

Primer  A:  5'-GCAGGTCGACTCTAGAAAGAAG-3' 

Primer S: 5'-GTGAAATGACTCTCAGCAT-3'

Primer B: 5'-ATGCTGAGAGTCATTTCAC-3'

Primer R: 5'-GTGAAATGACTCTCAGCAT(N)20CTAGTAAACTTATTCTTACGTC-3'
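The primer relationships that make the overlap-extension step work can be checked in silico: primer B is the reverse complement of primer S, and primer R begins with the S sequence, so the top strand of the P/B product ends where the top strand of the R/A product begins. The minimal sketch below is an idealized model (no polymerase errors, no annealing thermodynamics, DNA alphabet); the helper names `build_primer_r` and `overlap_join` are hypothetical and not part of the original protocol.

```python
# Idealized model of the overlap-extension assembly (DNA alphabet).
PRIMER_S = "GTGAAATGACTCTCAGCAT"
PRIMER_B = "ATGCTGAGAGTCATTTCAC"
R_3PRIME_ARM = "CTAGTAAACTTATTCTTACGTC"  # fixed sequence 3' of (N)20 in primer R

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_primer_r(insert: str) -> str:
    """One concrete instance of primer R: S + a 20-nt insert + the fixed 3' arm."""
    if len(insert) != 20 or set(insert) - set("ACGT"):
        raise ValueError("insert must be 20 nt of A/C/G/T")
    return PRIMER_S + insert + R_3PRIME_ARM


def overlap_join(upstream: str, downstream: str, min_overlap: int = 15) -> str:
    """Fuse two top-strand fragments whose end/start sequences are identical,
    using the longest such overlap of at least min_overlap nt."""
    for k in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        if upstream.endswith(downstream[:k]):
            return upstream + downstream[k:]
    raise ValueError(f"no overlap of at least {min_overlap} nt")


# The P/B product's top strand ends in reverse_complement(B), i.e. the S sequence.
assert reverse_complement(PRIMER_B) == PRIMER_S

# Toy fragments standing in for the two first-round PCR products.
upstream = "ACTATAG" + PRIMER_S                    # 3' end of a P/B product
downstream = build_primer_r("ACGT" * 5) + "GAAG"   # 5' end of an R/A product
print(overlap_join(upstream, downstream))
```

Here the two toy fragments fuse across the 19-nt S sequence, reproducing the full-length template with the random insert between the S sequence and the fixed 3' arm.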

Identification of consensus motifs among the selected sequences

Functional selected sequences for each SR protein were aligned using the Gibbs sampler
(Lawrence et al., 1993), with the assumption that a common sequence motif of length L is present
at least once in every sequence. Because the Gibbs sampler is a stochastic algorithm, at least ten
independent runs (with different random seeds) were carried out for each fixed L, each for enough
iterations to achieve convergence. A conservative value for L was determined empirically by
observing a sharp drop in information per parameter (Lawrence et al., 1993) as L was increased.
To exclude the possibility that the predicted consensus motif arose by chance, the information per
parameter was also compared to that of alignments of random sequences obtained by shuffling the
nucleotides within each sequence. The final alignment was manually adjusted in a few cases, when
better matches to the consensus could be obtained by including a few flanking nucleotides in the
alignment.
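The alignment step can be illustrated with a minimal Gibbs site sampler in the spirit of Lawrence et al. (1993). This sketch assumes exactly one motif occurrence per sequence, pseudocounts of 0.5, and a background estimated from whole sequences (a simplification of the original method). Runs from different seeds are compared with a summed log-odds score, which stands in for, and is not the same as, the information-per-parameter criterion used above to choose L. All function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGU"


def background(seqs):
    """Nucleotide frequencies over all sequences (simplification: motif sites included)."""
    total = sum(len(s) for s in seqs)
    return {a: sum(s.count(a) for s in seqs) / total for a in ALPHABET}


def profile(seqs, starts, L, skip=None, pseudo=0.5):
    """Position-specific frequencies of the aligned sites, leaving out sequence `skip`."""
    counts = [dict.fromkeys(ALPHABET, pseudo) for _ in range(L)]
    for k, (s, x) in enumerate(zip(seqs, starts)):
        if k != skip:
            for i in range(L):
                counts[i][s[x + i]] += 1
    return [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]


def gibbs_sampler(seqs, L, iterations=1000, seed=0):
    """Site sampler with exactly one motif occurrence of length L per sequence."""
    rng = random.Random(seed)
    bg = background(seqs)
    starts = [rng.randrange(len(s) - L + 1) for s in seqs]
    for _ in range(iterations):
        k = rng.randrange(len(seqs))  # sequence whose site is re-sampled
        prof = profile(seqs, starts, L, skip=k)
        s = seqs[k]
        # likelihood ratio (motif model vs background) of every candidate start
        weights = [
            math.prod(prof[i][s[x + i]] / bg[s[x + i]] for i in range(L))
            for x in range(len(s) - L + 1)
        ]
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def alignment_log_odds(seqs, starts, L):
    """Summed log2 likelihood ratio of the aligned sites; used to compare runs."""
    bg = background(seqs)
    prof = profile(seqs, starts, L)
    return sum(
        math.log2(prof[i][s[x + i]] / bg[s[x + i]])
        for s, x in zip(seqs, starts)
        for i in range(L)
    )


def best_of_runs(seqs, L, runs=10, iterations=1000):
    """Repeat the stochastic search with different seeds and keep the best alignment."""
    candidates = [gibbs_sampler(seqs, L, iterations, seed=r) for r in range(runs)]
    return max(candidates, key=lambda st: alignment_log_odds(seqs, st, L))


def shuffle_within(seqs, seed=0):
    """Control set: shuffle nucleotides within each sequence, preserving its composition."""
    rng = random.Random(seed)
    shuffled = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        shuffled.append("".join(chars))
    return shuffled
```

Running `best_of_runs` on the real winners and on `shuffle_within` of the same winners gives the kind of real-versus-shuffled comparison described above, although with a simpler score.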

Construction  of  a  scoring  matrix 

First, a frequency matrix f_i(a) was calculated from the alignment, where f_i(a) is the frequency
of nucleotide a at motif position i. Given the background frequency p(a) of each nucleotide in the
set of sequences, the scoring matrix is defined by the following formula:

$$ s_i(a) = \log_2 \frac{f_i(a) + e\,p(a)}{p(a)\,(1 + e)} $$

where i ∈ {1, 2, ..., L}, a ∈ {A, C, G, U}, and e = 0.5 is the Bayesian prior parameter
(Lawrence et al., 1993).
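The formula can be computed directly from an alignment. In the sketch below (names hypothetical; RNA alphabet and equal-length aligned motifs assumed), the e·p(a) term acts as a pseudocount proportional to the background, and dividing by (1 + e) keeps the smoothed frequencies summing to 1 at every position, so a nucleotide never observed at a position receives a finite negative score rather than minus infinity. A motif's score is the sum of its per-position scores.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def scoring_matrix(aligned_motifs, p, e=0.5):
    """One {nucleotide: score} dict per motif position, using
    s_i(a) = log2[(f_i(a) + e*p(a)) / (p(a) * (1 + e))]."""
    n = len(aligned_motifs)
    L = len(aligned_motifs[0])
    matrix = []
    for i in range(L):
        counts = Counter(m[i] for m in aligned_motifs)  # missing keys count as 0
        matrix.append({
            a: math.log2((counts[a] / n + e * p[a]) / (p[a] * (1 + e)))
            for a in ALPHABET
        })
    return matrix


def motif_score(matrix, motif):
    """A motif's score is the sum of its per-position scores."""
    return sum(column[base] for column, base in zip(matrix, motif))


# Toy example: uniform background and three identical aligned sites.
uniform = {a: 0.25 for a in ALPHABET}
toy = scoring_matrix(["GAAG", "GAAG", "GAAG"], uniform)
print(toy[0]["G"], motif_score(toy, "GAAG"))
```

With a uniform background and a fully conserved position, the matching nucleotide scores log2[(1 + 0.125)/0.375] = log2(3) and every other nucleotide scores log2(1/3).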

A  motif  score  is  equal  to  the  sum  of  the  scores  at  each  position.  Motifs  may  be  ranked  by  their 
scores.  The  top  three  scores  in  each  sequence  using  all  three  different  scoring  matrices  (SF2/ASF, 
SRp40,  and  SRp55)  were  calculated  and  tabulated  (data  not  shown). 

The  sequence-scores  were  consistent  semi-quantitatively  with  the  gel  intensity  data  when  the 
sequence-scores  for  a  given  SR  protein  were  defined  as  follows: 

(a) The maximum score for each selected sequence was calculated using the scoring matrix,
and the threshold was defined as the minimum of these maximum scores.

(b)  The  sequence-score  was  defined  as  the  number  of  non-overlapping  motifs  that  have  a 
score  greater  than  or  equal  to  the  threshold.  This  integer  score  correlated  well  with  the 
corresponding  gel  intensity. 
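Steps (a) and (b) can be written out as follows. This is a minimal sketch: the matrix format (one {nucleotide: score} dict per position) and all names are assumptions, and the usage matrix is a hypothetical toy rather than one of the SR protein matrices. Because every window has the same length, taking the leftmost qualifying window first yields the largest possible number of non-overlapping hits.

```python
def window_scores(matrix, seq):
    """Score of every length-L window of seq (L = number of matrix positions)."""
    L = len(matrix)
    return [
        sum(column[base] for column, base in zip(matrix, seq[x:x + L]))
        for x in range(len(seq) - L + 1)
    ]


def winner_threshold(matrix, winners):
    """Step (a): best window score in each selected sequence; threshold = smallest of these."""
    return min(max(window_scores(matrix, w)) for w in winners)


def sequence_score(matrix, seq, threshold):
    """Step (b): number of non-overlapping windows scoring >= threshold."""
    L = len(matrix)
    scores = window_scores(matrix, seq)
    count, x = 0, 0
    while x < len(scores):
        if scores[x] >= threshold:
            count += 1
            x += L  # jump past this window so that hits cannot overlap
        else:
            x += 1
    return count


# Hypothetical 2-position matrix favouring the dinucleotide "GA".
TOY = [
    {"A": -1.0, "C": -1.0, "G": 1.0, "U": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "U": -1.0},
]
threshold = winner_threshold(TOY, ["GAGA", "CCGA"])
print(threshold, sequence_score(TOY, "GAGAGA", threshold))
```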

It  should  be  noted  that  a  motif  scoring  matrix  may  depend  on  the  pre-mRNA  substrate  and  on 
the  experimental  parameters,  such  as  the  concentration  of  SR  protein. 

In  vitro  splicing 

In vitro splicing was performed as described previously (Mayeda and Krainer, 1997a).
Briefly, 20 fmol of 32P-labeled, m7GpppG-capped SP6 or T7 transcripts generated from PCR
products were incubated in 25-μl splicing reactions. The reactions contained 4 μl of HeLa nuclear
extract or 7 μl of S100 extract in buffer D. The MgCl2 concentration was 4.8 mM. Twenty pmol of
the appropriate SR protein was used in S100 complementation assays. After incubation at 30 °C for
4 hours, the RNA was extracted and analyzed on 5.5% polyacrylamide denaturing gels, followed by
autoradiography.
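For orientation, the amounts above correspond to the following final concentrations. This is plain arithmetic on the stated quantities and assumes that both the pre-mRNA and the SR protein are diluted into the full 25-μl reaction; it adds no new measurement.

```python
REACTION_VOLUME_L = 25e-6  # 25-µl splicing reaction


def concentration_molar(amount_mol: float, volume_l: float = REACTION_VOLUME_L) -> float:
    """Final molar concentration of a given amount in the reaction volume."""
    return amount_mol / volume_l


pre_mrna_nM = concentration_molar(20e-15) * 1e9    # 20 fmol substrate -> about 0.8 nM
sr_protein_uM = concentration_molar(20e-12) * 1e6  # 20 pmol SR protein -> about 0.8 µM
sr_to_substrate = 20e-12 / 20e-15                  # about a 1,000-fold molar excess
nuclear_extract_pct = 100 * 4 / 25                 # 4 µl of 25 µl -> 16% (v/v)
s100_extract_pct = 100 * 7 / 25                    # 7 µl of 25 µl -> 28% (v/v)

print(f"pre-mRNA {pre_mrna_nM:.2f} nM, SR protein {sr_protein_uM:.2f} µM, "
      f"ratio {sr_to_substrate:.0f}, NE {nuclear_extract_pct:.0f}%, "
      f"S100 {s100_extract_pct:.0f}%")
```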

UV-crosslinking 

UV-crosslinking experiments were carried out under splicing conditions with or without a 5- to
50-fold molar excess of unlabeled competitor RNA. Polyvinyl alcohol was omitted from the
crosslinking reactions. After a 30-min incubation at 30 °C, the reactions were exposed on ice to
254-nm UV light in a Spectronics XL-1000 UV crosslinker at a setting of 1.8 J/cm². Then 10 μg of
RNase A and 100 units of RNase T1 were added, and the reactions were incubated for 15 min at
37 °C. The crosslinked proteins were analyzed by SDS-PAGE on a 12% gel, followed by
autoradiography.

Immunoprecipitations 

Immunoprecipitations were performed as described (Sun et al., 1993). The anti-SF2/ASF
monoclonal antibody recognizes the N-terminus of SF2/ASF and does not crossreact with other
human SR proteins (A.R. Krainer, unpublished). Polyclonal antiserum against SRp40
(anti-HRS/SRp40) was a generous gift from Drs. K. Du and R. Taub (Du et al., 1997). The
crosslinking reactions were pre-cleared by incubation with control antibodies and 50 μl of protein
A-agarose (1:1 suspension) in 500 μl of IP buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl,
0.05% NP-40) for 2 hours at 4 °C. An unrelated monoclonal antibody of the same isotype was
used for the SF2/ASF pre-clearing step, and rabbit pre-immune serum was used for the SRp40
pre-clearing step. After spinning in a microcentrifuge for 30 min at 4 °C, the supernatants were
transferred to tubes containing the appropriate antibody immobilized on protein A-agarose and
rocked overnight at 4 °C. The bound material was recovered by centrifugation, washed twice with
1 ml of IP buffer, eluted in 30 μl of sample buffer (62.5 mM Tris-HCl, pH 6.8, 2% (w/v) SDS,
10% (v/v) glycerol, 5% (v/v) 2-mercaptoethanol), and analyzed by SDS-PAGE and
autoradiography.

2.  Results 

Identification of SR protein target sequences from a random pool under splicing conditions

To find specific target sequences recognized by individual SR proteins under splicing
conditions, I used a procedure based on SELEX (Tuerk and Gold, 1990) that imposes a selection
for splicing (Tian and Kole, 1995; Coulter et al., 1997) rather than for binding (Heinrichs and
Baker, 1995; Tacke and Manley, 1995; Shi et al., 1997; Tacke et al., 1997). I further modified the
procedure by carrying out the splicing reactions in the presence of a single recombinant SR protein,
which was used to complement HeLa extracts deficient in SR proteins: either an S100 extract or an
SR protein-depleted nuclear extract. I chose to perform the selection for ESEs in the context of a
well-characterized IgM minigene transcript, comprising the last intron flanked by the M1 and M2
membrane isoform-specific exons. A prototypical ESE was previously mapped to a 73-nt fragment
of exon M2 (Watakabe et al., 1993). This ESE was found to be essential for IgM pre-mRNA
splicing in nuclear or S100 extract (Watakabe et al., 1993; our unpublished data). The scheme for
the randomization and selection procedure is outlined in Figure 1A (see Experimental procedures
for details). First, the natural ESE in the M2 exon was replaced by 20 nt of random sequence. A
library of pre-mRNAs representing 1.2 × 10^10 different sequences was spliced in S100 extract
complemented with either recombinant SF2/ASF, SRp40, or SRp55. The spliced mRNAs, now
carrying functional ESEs, were recovered from denaturing polyacrylamide gels. The randomized
region of exon M2 of the spliced mRNAs was then amplified by RT-PCR and re-assembled into a
new pool of pre-mRNAs for further selection. Two additional rounds of selection were carried out
in SR protein-depleted nuclear extract (Blencowe et al., 1994) complemented with individual SR
proteins, in order to mimic the conditions of nuclear extract, to minimize possible biases specific to
the S100 extract, and to select the most efficient ESEs.
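The library size can be put in perspective with a short calculation. The sketch assumes equal and independent base frequencies in the random region, which the initial pool only approximated (it contained 39% G residues), and it ignores motifs that straddle the constant flanks; the names are illustrative. Under these assumptions the pool sampled only about 1% of the roughly 1.1 × 10^12 possible 20-mers, while any particular 5- to 7-nt motif was still expected to be present on the order of 10^7 times or more.

```python
RANDOM_NT = 20
LIBRARY_SIZE = 1.2e10            # different sequences in the initial pre-mRNA pool

sequence_space = 4 ** RANDOM_NT  # number of possible 20-mers
fraction_sampled = LIBRARY_SIZE / sequence_space


def expected_copies(k: int, library: float = LIBRARY_SIZE, n: int = RANDOM_NT) -> float:
    """Expected occurrences of one particular k-mer within the random regions of the
    library, assuming equal and independent base frequencies (an idealization)."""
    return library * (n - k + 1) / 4 ** k


print(f"{sequence_space:.3e} possible 20-mers; {100 * fraction_sampled:.2f}% sampled")
for k in (5, 6, 7):
    print(k, f"{expected_copies(k):.2e}")
```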

After three rounds of selection, the spliced mRNAs were amplified by RT-PCR, subcloned
and sequenced. Twenty-four or more independent sequences obtained with each SR protein were
analyzed to determine a consensus sequence, using the Gibbs sampler program (Lawrence et al.,
1993). The defined motifs were used to generate a score matrix, according to the frequency of each
nucleotide at each position. These score matrices were used to search for high-scoring motifs in
each winner sequence. Small portions of the constant flanking regions (18 nt of the 5' region and
20 nt of the 3' region) were included in the search. The resulting alignments of sequences selected
with SF2/ASF, SRp40, or SRp55 are shown in Figures 2A, 3 and 4, respectively. As a control,
30 sequences from the initial random RNA pool are shown in Figure 2B.

The consensus sequences derived for each of the three SR proteins tested differed in both
length and sequence. Each consensus sequence is relatively degenerate, and not all of the
individual selected sequences match the consensus at every position. However, many of the
individual sequences have more than one good match to the consensus, allowing for one or two
mismatches.
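Matching a degenerate consensus while tolerating mismatches is straightforward with IUPAC codes. The codes S, R, D, K and M below follow the definitions given in the text; the remaining entries are standard IUPAC codes added for completeness, and the function names are hypothetical. Note that allowing two mismatches in a motif as short as ACDGS admits many chance matches, so such counts are only a rough guide.

```python
# IUPAC nucleotide codes for RNA (U in place of T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def mismatches(window: str, consensus: str) -> int:
    """Positions where the base is not allowed by the degenerate consensus code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, consensus))


def consensus_matches(seq: str, consensus: str, max_mismatches: int = 2):
    """All (start, window, mismatches) tuples with at most max_mismatches mismatches."""
    L = len(consensus)
    hits = []
    for x in range(len(seq) - L + 1):
        window = seq[x:x + L]
        mm = mismatches(window, consensus)
        if mm <= max_mismatches:
            hits.append((x, window, mm))
    return hits


print(consensus_matches("UUACGGCUU", "ACDGS", max_mismatches=0))
```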

The SF2/ASF winners gave the consensus sequence SRSASGA (S represents G or C; R
represents a purine), which only in some cases corresponds to a purine-rich motif. The content of U
residues in the SF2/ASF winner pool was 16%, a significant reduction from the 21% of U residues
found in the initial random pool. This reduction can be accounted for by the absence of U residues
from the consensus motif. The content of C residues increased by 4%. The frequencies of A and G
did not vary significantly upon selection (Figure 2, A and B). SF2/ASF was previously shown to
recognize purine-rich sequences in SELEX procedures based on binding; the reported sequences,
RGAAGAAC and AGGACRRAGC (Tacke and Manley, 1995), are significantly different from,
and simpler than, the consensus motif I found. Similar experiments, performed independently in
our laboratory, revealed a different purine-rich consensus sequence, GARGAGC (A. Hanamura,
I. Watakabe, and A.R. Krainer, unpublished data). In the present study, only 13 of 28 winners have
uninterrupted purine-rich motifs longer than 5 nt, indicating that SF2/ASF can productively
recognize a far broader range of sequences. Indeed, the overall purine composition of the
SF2/ASF-selected pool did not change significantly from that of the initial random pool.
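The composition figures and the purine-run count above are simple tallies over a pool of sequences. A minimal sketch follows; the names are hypothetical, "longer than 5 nt" is read as six or more contiguous purines, and which part of each sequence is scanned (the random 20 nt only, or including flanks) is left to the caller.

```python
def composition(seqs):
    """Percentage of each nucleotide across a pool of sequences."""
    total = sum(len(s) for s in seqs)
    return {a: 100 * sum(s.count(a) for s in seqs) / total for a in "ACGU"}


def longest_purine_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of A/G."""
    best = run = 0
    for base in seq:
        run = run + 1 if base in "AG" else 0
        best = max(best, run)
    return best


def count_purine_rich(seqs, min_run: int = 6) -> int:
    """Number of sequences containing a purine run of at least min_run nt."""
    return sum(longest_purine_run(s) >= min_run for s in seqs)


pool = ["UGAAGAC", "AGAGAGU"]  # hypothetical toy sequences
print(composition(pool), count_purine_rich(pool))
```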

The consensus for the SRp40-selected sequences is ACDGS (D represents residues other than
C; S represents G or C). This consensus is also very different from the sequence previously
determined as an optimal RNA-binding site for SRp40, TGGGAGCRGTYRGCTCGY (Tacke et al.,
1997). The content of G residues in the SRp40 winner RNA pool decreased from 39% in the initial
random pool to 34%. The content of C residues increased by 5%. The frequencies of A and U did
not change significantly (Figures 3 and 2B). The consensus motif for the SRp40 winners is
relatively short but has sufficient information content that, for example, it does not occur by
chance in most of the RNAs sampled from the initial random pool. Winners SRp40-1, SRp40-2,
SRp40-3 and SRp40-4 are clearly not derived from a single founder sequence by mutations
accumulated during PCR. However, they all share the sequence ACGGC, which matches the
consensus and is the only sequence common to these winners. Similar sequence relationships are
seen among winners SRp40-5, SRp40-6 and SRp40-7; SRp40-8, SRp40-9 and SRp40-10;
SRp40-11 and SRp40-12; SRp40-13 and SRp40-14; and SRp40-15 and SRp40-16.
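The observation that a group of unrelated winners shares only one short sequence can be checked mechanically by intersecting the k-mer sets of the sequences. The sketch below is illustrative: the sequences in the usage line are hypothetical toys, not the actual SRp40 winners, and exact shared k-mers are only one simple notion of common sequence (gapped or mismatched similarity is ignored). The search can stop at the first k with no shared k-mer, because any shared longer k-mer would contain a shared shorter one.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs, k: int) -> set:
    """k-mers present in every sequence."""
    common = kmers(seqs[0], k)
    for s in seqs[1:]:
        common &= kmers(s, k)
    return common


def longest_shared(seqs, max_k: int = 20):
    """Largest k with at least one k-mer common to all sequences, and those k-mers."""
    best = (0, set())
    for k in range(1, max_k + 1):
        common = shared_kmers(seqs, k)
        if not common:
            break
        best = (k, common)
    return best


toy_winners = ["UUACGGCAUU", "GACGGCUCCA", "CCCACGGCGU"]  # hypothetical
print(longest_shared(toy_winners))
```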

The SRp55 winners yielded the consensus sequence USCGKM (S represents G or C; K
represents U or G; M represents A or C). The C residue content in the SRp55 winner pool
increased significantly, from 19% in the starting pool to 26%. The content of G and U residues
decreased by 5% and 4%, respectively (Figures 4 and 2B). B52, which appears to be the
Drosophila orthologue of human SRp55, was reported to have GRUCAACCNGGCGACNG as its
optimal binding site (Shi et al., 1997). That report also suggested that a hairpin structure was
required for efficient B52 binding. In contrast, I did not observe common secondary structure
elements in the human SRp55 winner sequences.

The  same  sequence  analysis  programs  were  used  to  search  the  initial  random  pool,  but  no 
stable  pattern  was  found.  Each  of  the  consensus  sequences  was  used  to  create  a  score  matrix. 
These  score  matrices  were  then  used  to  search  all  three  of  the  winner  pools  and  the  initial  random 
pool.  The  mean  score  of  the  corresponding  SR  protein-selected  winner  pool  was  always  higher 
than  that  of  the  other  three  pools  (data  not  shown). 

The  SELEX  winner  sequences  function  as  bona  fide  ESEs 

To investigate the functional importance of the winner sequences, several winners were
randomly chosen from the winner pool of each SR protein. Their ability to function as enhancers
was tested by splicing the corresponding pre-mRNAs in HeLa nuclear extract or in S100 extract
plus individual SR proteins (Figure 5).

All the SF2/ASF-selected sequences promoted efficient splicing in nuclear extract (Figure 5A,
lanes 1, 4, 7, 10, 13, 16, 19, and 22), indicating that the selected sequences can function as true
ESEs. Furthermore, these ESE sequences promoted splicing in S100 extract plus recombinant
SF2/ASF (Figure 5A, lanes 3, 6, 9, 12, 15, 18, 21, and 24), but not in S100 extract alone (Figure
5A, lanes 2, 5, 8, 11, 14, 17, 20, and 23). The splicing efficiency in S100 extract plus SF2/ASF
was lower than that in nuclear extract. Winner sequences comprising either purine-rich (B1, B3,
B4, B5 and B7) or non-purine-rich motifs (B2, B6, and B8) resulted in comparable splicing
efficiencies.

A 20-40% ammonium sulfate cut of nuclear extract (NF20/40) was previously shown to be
required for the function of synthetic ESEs (selected by a binding protocol) in S100 extract plus the
appropriate SR protein (Tacke and Manley, 1995; Tacke et al., 1997). In contrast, the enhancers I
selected by function were able to act in S100 extract plus the appropriate SR protein without
further additions (Figure 5A). Supplementation with an NF20/40 fraction did not significantly
improve the splicing efficiency (data not shown).

Similar results were obtained in splicing assays with the SRp40 winners. All seven winners
tested spliced in nuclear extract (Figure 5B, lanes 1, 4, 7, 10, 13, 16, and 19) and in S100 extract
plus recombinant SRp40 (Figure 5B, lanes 3, 6, 9, 12, 15, 18, and 21), but not in S100 extract
alone (Figure 5B, lanes 2, 5, 8, 11, 14, 17, and 20). The splicing efficiency of the SRp40 winners
in nuclear extract was lower on average than that of the SF2/ASF winners (Figure 5A, 5B). In
S100 extract alone, several of the pre-mRNAs were extensively degraded (Figure 5B, constructs
E1, E2, E4 and E7). This is consistent with the notion that SR proteins are involved in assembly of
a commitment complex: substrates that are not productively assembled into commitment
complexes and pre-spliceosomes are generally more susceptible to degradation by non-specific
nucleases in the extract.

The eight SRp55 winners tested all spliced in nuclear extract (Figure 5C, lanes 1, 5, 9, 13,
17, 21, 25, and 29) and in S100 extract plus SRp55 (Figure 5C, lanes 4, 8, 12, 16, 20, 24, 28,
and 32), but not in S100 extract alone (Figure 5C, lanes 2, 6, 10, 14, 18, 22, 26, and 30).
Interestingly, four of the eight SRp55 winners tested spliced more efficiently in S100 extract plus
SRp55 than in nuclear extract alone (Figure 5C, C1, C4, C6 and Cl), suggesting that SRp55 is the
only SR protein required for effective recognition of these winner sequences. The higher splicing
efficiency of the C1, C4, C6 and Cl pre-mRNAs in S100 compared to nuclear extract cannot be
accounted for by increased stability, since the remaining winners, C2, C3, C5 and Cl, were
also greatly stabilized in S100 extract plus SRp55, but their splicing efficiencies were lower than in
nuclear extract.

I tested whether the short consensus motifs are sufficient to activate splicing. This was done
by replacing sequences within template A13 with the short consensus motifs from the B1, B2, C4,
or E7 winners (Figure 2A) and then testing the splicing activity of the corresponding pre-mRNAs.
A13 was isolated from the initial random pool and had very low splicing activity in nuclear extract
and no splicing activity in S100 extract complemented with any of the three SR proteins tested.
Insertion of one copy of a consensus motif was sufficient to activate splicing of the modified
A13 pre-mRNA in S100 extract plus the cognate SR protein, although the splicing efficiencies
were very low (data not shown). The low efficiency suggests that the sequence context
surrounding the conserved motifs is also important for splicing, consistent with the observation
that several of the winner sequences contain more than one match to the consensus. I also made
and tested a number of clustered point mutations in several of the ESEs selected by each SR
protein. I was unable to inactivate ESE function either by mutations in the best match to the
consensus motif or by mutations on either side of the motif (data not shown). This unexpected
observation suggests that each of the selected sequences has a high level of internal functional
redundancy, which is probably necessary to allow efficient splicing.

SR  protein  specificity  of  the  selected  ESEs 

The  fact  that  each  SR  protein  selected  ESEs  that  fit  a  different  consensus  sequence  suggests 
that  SR  proteins  recognize  the  synthetic  ESEs  in  a  sequence-specific  manner.  On  the  other  hand, 
the observation that most of the winner sequences promoted more efficient splicing in nuclear
extract than in the S100 complementation reactions suggests that SR proteins may function
cooperatively. I examined these possibilities by testing the effect of different SR proteins on
splicing of pre-mRNAs with each type of winner. These experiments were performed in S100
extract complemented with individual SR proteins or with pairwise combinations thereof.
Strikingly,  the  three  kinds  of  SR  protein  winner  sequences  showed  very  different  specificities. 

The SRp40-selected winner sequences failed to splice in S100 extract plus SF2/ASF (Figure
6A, lanes 1, 4, 7, 10, 13, 16 and 19). However, they did splice in S100 extract plus SRp55
(Figure 6B, lanes 1, 4, 7, 10, 13 and 16). Adding two SR proteins together did not significantly
increase the splicing efficiency, although additive effects were observed with two of the winners,
E4 and E6, using SRp40 and SRp55 (Figure 6A, 6B).

The SF2/ASF-selected winner sequences gave a different result. The B2 winner spliced very
poorly in S100 extract plus SRp55, whereas the remaining winners, B1, B3, B4,
B5, B6 and B7, failed to splice under these conditions (Table 1). SRp40 did not activate splicing
of any of the SF2/ASF-selected winners tested. Moreover, SRp40 inhibited splicing of the
SF2/ASF-selected winners even in the presence of SF2/ASF (Table 1).

The SRp55 winner C1 spliced in S100 extract plus any of the three SR proteins I examined.
However, addition of two SR proteins did not increase its splicing efficiency. All six other
SRp55 winners tested failed to splice in S100 extract plus SF2/ASF or SRp40 (Table 1).

SR  proteins  bind  specifically  to  the  ESEs  in  nuclear  extract 

I  have  selected  ESEs  that  respond  specifically  to  individual  SR  proteins  under  splicing 
conditions.  Because  the  selection  was  on  the  basis  of  function,  it  does  not  necessarily  follow  that 
the  SR  proteins  bind  directly  to  the  cognate  ESEs,  although  this  is  generally  thought  to  be  the  case 
for  at  least  some  natural  enhancers  (see  Discussion).  To  determine  if  SR  proteins  directly  contact 
the novel ESEs, I carried out UV-crosslinking experiments under splicing conditions in nuclear
extract, using radiolabeled RNA fragments comprising the M2 exon with the different ESEs. An
M2 exon RNA (20 fmol) comprising the SF2/ASF-selected winner B1 (referred to as B1E) was
incubated in the presence of excess cold exon M2 RNA competitors with different ESEs. The
reaction mixtures were then irradiated with UV light on ice, digested with RNases A and T1 and
analyzed by SDS-PAGE. B1E crosslinked specifically to a 34-kDa polypeptide (Figure 7A). This
crosslink could be competed by cold B1E RNA (lanes 2 and 3), but not by exon M2 RNAs with an
SRp40- or an SRp55-selected ESE (lanes 10 and 11, and lanes 4 and 5, respectively). Exon M2
RNAs with ESEs selected by other SR proteins (lanes 8, 9, 12 and 13) or with sequences from the
initial random pool (lanes 6 and 7) also failed to compete with B1E for crosslinking to the 34-kDa
polypeptide.

To  confirm  that  the  34-kDa  polypeptide  is  SF2/ASF,  I  carried  out  immunoprecipitations  after 
UV  crosslinking  and  RNase  digestion  (Figure  7B).  As  expected,  the  crosslinked  34-kDa 
polypeptide  was  immunoprecipitated  by  a  monoclonal  antibody  specific  for  SF2/ASF  (lane  2)  but 
not  by  a  control  monoclonal  antibody  (lane  3). 

Similar UV-crosslinking experiments were attempted using SRp40- and SRp55-selected
ESEs. Crosslinked proteins of approximately 40 kDa and 55 kDa were detected using radiolabeled
exon M2 RNA corresponding to the SRp40 winner E7 (E7E), and the 55-kDa protein also
crosslinked to the SRp55 winner C4 (C4E), although the background was high (data not shown).
Neither of these RNAs crosslinked to proteins with the mobility of SF2/ASF. Crosslinking to the
55-kDa protein was competed by an excess of cold C4E RNA, but not by B1E or E7E.
Immunoprecipitation with a polyclonal antiserum that recognizes both SRp40 and SRp55 (Du et
al., 1997) selectively precipitated crosslinked proteins of the expected size (data not shown). These
results suggest that, similar to SF2/ASF, SRp55 and SRp40 also interact directly with their
cognate in vitro-selected ESEs.

Sequences that fit the consensus for in vitro-selected ESEs are present in natural
exons and known ESEs

Sequences  identified  by  SELEX  procedures  do  not  necessarily  correspond  to  functional 
elements  that  have  evolved  in  nature  (Irvine  et  al.,  1991).  To  evaluate  the  biological  significance  of 
the  novel  ESE  consensus  sequences  I  identified,  I  analyzed  their  distribution  in  known  sequences 
of  natural  genes.  I  reasoned  that  if  the  short  consensus  motifs  derived  from  the  in  vitro-selected 
ESEs  are  akin  to  natural  ESEs,  they  should  be  present  with  higher  probability  in  regions 
corresponding  to  known  ESEs  than  elsewhere  in  the  exons  or  in  intron  sequences.  The  score 
matrices  derived  for  each  of  the  three  SR  proteins  tested  were  used  to  search  genes  or  exons  with 
previously  characterized  ESEs.  The  resulting  scores  were  then  plotted  against  the  position  along 
the  exons  or  genes  (Figure  8). 
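The text does not spell out how the score matrices were built or applied. One common approach is a position-specific log-odds matrix estimated from aligned motifs and slid along a sequence one position at a time. The per-position score it produces is the kind of profile plotted in Figure 8. The sketch below assumes that approach purely for illustration. Its training motifs are hypothetical sequences chosen to fit the SF2/ASF consensus SRSASGA given in the Discussion, not the actual winners. The pseudocount and uniform background are also assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

BASES = "ACGT"  # assumption: sequences contain only A/C/G/T (U is folded into T)


def build_log_odds_matrix(
    motifs: Sequence[str],
    pseudocount: float = 0.5,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Position-specific log2-odds matrix from equal-length, aligned motifs."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    seqs = [m.upper().replace("U", "T") for m in motifs]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all motifs must have the same length")
    matrix = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            counts[s[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return matrix


def scan(matrix: List[Dict[str, float]], sequence: str) -> List[float]:
    """Score every window; element i is the score of the window starting at i."""
    seq = sequence.upper().replace("U", "T")
    width = len(matrix)
    return [
        sum(col[base] for col, base in zip(matrix, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    ]


# Hypothetical training motifs that fit SRSASGA (not the real winner sequences).
pwm = build_log_odds_matrix(["CACACGA", "GAGAGGA", "CGCAGGA"])
scores = scan(pwm, "CACACGAUUUUUUU")
best = max(range(len(scores)), key=scores.__getitem__)  # start of highest-scoring window
```

Plotting `scores` against position gives a profile like the one described above. Real matrices would be estimated from the selected winner pool.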

The  natural  sequence  of  the  mouse  IgM  exon  M2  was  analyzed  first,  since  our  ESEs  were 
selected  in  the  context  of  this  exon,  after  deletion  of  its  natural  ESE.  Remarkably,  a  high  density  of 
motifs  with  high-score  matches  to  the  SF2/ASF  and  SRp40  consensus  was  found  within  the  73-nt 
natural ESE mapped previously (Watakabe et al., 1993). In contrast, few matching sequences were
found in the flanking regions of the exon, and most of these had lower scores, correlating with the
lack of splicing upon deletion of the natural ESE. The distribution of motifs with high-score
matches to the SRp55 ESE consensus did not correlate with the location of the natural ESE. The
SR protein specificity of the natural M2 ESE was not known from previous work, but I have
determined that IgM minigene transcripts comprising the natural ESE can function in S100 extract
complemented with SF2/ASF, SRp40 or SRp55 (data not shown). The high-score SF2/ASF and
SRp40  motifs  are  present  in  clusters,  suggesting  that  multiple  copies  of  these  motifs  are 
particularly  effective  as  ESEs,  or  provide  an  optimal  context.  Indeed,  multimerization  of  short 
repeats  often  results  in  increased  ESE  activity,  both  in  natural  and  synthetic  enhancer  elements 
(Tian  and  Maniatis,  1993;  Tanaka  et  al.,  1994;  Tacke  and  Manley,  1995). 
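The observation that high-score motifs occur in clusters could be made quantitative by counting threshold-passing positions in a sliding window along a score profile. The sketch below is hypothetical. The threshold and window length are illustrative parameters, not values used in this analysis.

```python
from typing import List, Sequence


def windowed_hit_counts(scores: Sequence[float], threshold: float, window: int) -> List[int]:
    """Number of positions scoring >= threshold in each sliding window of `window` positions.

    Uses a prefix sum so each window count is O(1). If the profile is shorter than
    the window, a single count for the whole profile is returned.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    flags = [1 if s >= threshold else 0 for s in scores]
    prefix = [0]
    for f in flags:
        prefix.append(prefix[-1] + f)
    if len(flags) <= window:
        return [prefix[-1]]
    return [prefix[i + window] - prefix[i] for i in range(len(flags) - window + 1)]
```

Windows with elevated counts mark candidate clusters. Comparing those counts with a shuffled-sequence baseline would be one way to judge whether the clustering exceeds chance.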

I next analyzed the sequence of the last exon of the bovine growth hormone (bGH) gene,
which contains a natural ESE, previously mapped to a 115-nt fragment, that is required for
splicing of the preceding intron (Sun et al., 1993). The highest density of sequences matching the
SF2/ASF, SRp40 and SRp55 consensus ESEs was found within the 115-nt fragment
corresponding to the natural ESE, compared to the rest of the 189-nt exon. The highest scores for
each of the three SR protein motifs were all found within the fragment with natural enhancer
activity. Although the last intron of the bGH pre-mRNA does not splice in S100 extract in the
presence of SR proteins, apparently because it requires additional factors, the ESE in the last exon was
previously shown to bind SF2/ASF specifically, and this SR protein also stimulated bGH splicing
in nuclear extract (Sun et al., 1993).

The  caldesmon  pre-mRNA  is  alternatively  spliced  in  a  tissue-specific  manner  (Humphrey  et 
al.,  1995).  An  alternative  5'  splice  site  within  the  large  exon  5  is  used  in  non-muscle  cells,  which 
also exclude exon 6. In smooth muscle, the entire exon 5 is included and spliced to exon 6. A 32-nt
repeat present in multiple copies within the 3' portion of exon 5 functions as an ESE to enhance
usage  of  the  upstream  non-muscle-specific  5'  splice  site  (Humphrey  et  al.,  1995).  Our  sequence 
analysis  showed  that  SF2/ASF  and  SRp40  ESE  consensus  sequences  are  highly  enriched  within 
the  3'  portion  of  exon  5,  whereas  SRp55  consensus  sequences  are  found  much  more  frequently 
upstream  of  the  non-muscle-specific  5'  splice  site. 

Female-specific  alternative  splicing  of  the  Drosophila  doublesex  (dsx)  pre-mRNA  involves  six 
13-nt  repeat  elements  (dsxRE)  and  a  purine-rich  element  (PRE)  (Tian  and  Maniatis,  1993;  Lynch 
and Maniatis, 1994). These cis-acting elements are essential for splicing of a dsx pre-mRNA in
HeLa cell nuclear extract. UV-crosslinking analysis showed that the dsxREs bind specifically to the
human SR protein 9G8, whereas the PRE binds preferentially to SF2/ASF and probably other SR
proteins of similar size in HeLa nuclear extract (Lynch and Maniatis, 1996). Consistent with these
results,  our  sequence  analysis  did  not  reveal  any  high-score  motifs  matching  the  SF2/ASF, 
SRp40,  and  SRp55  ESE  consensus  sequences  within  the  dsxRE,  whereas  high-score  matches  to 
the  SF2/ASF  ESE  were  found  within  the  PRE. 

Finally,  I  also  analyzed  the  sequences  of  characterized  ESEs  present  in  exon  5  of  chicken 
cardiac  troponin  T  (Xu  et  al.,  1993),  in  exon  3  of  the  Tat  gene  of  equine  infectious  anemia  virus 
(Gontarek  and  Derse,  1996),  in  late  pre-mRNAs  of  bovine  papilloma  virus  type  1  (Zheng  et  al., 
1996),  in  exon  ED-A  of  human  fibronectin  (Lavigueur  et  al.,  1993;  Caputi  et  al.,  1994),  and  in  the 
exon  downstream  of  the  tat-rev  intron  of  HIV-1  (Amendt  et  al.,  1995;  Staffa  and  Cochrane,  1995). 
In  all  cases,  the  sequence  analysis  was  consistent  with  the  available  data  on  these  natural  ESEs  and 
the  binding  of  SR  proteins,  when  known  (data  not  shown). 

Next I used the same score matrices to analyze the distribution of high-score motifs in human
exons versus introns. A total of 570 intron-containing genes, corresponding to 2634 exons (431 kb) and
2079 introns (1300 kb), were extracted from the ALLSEQ data set (Burset and Guigo, 1996) and
analyzed. I counted all motifs with a score equal to or greater than the mean score of the
selected winner pool for each SR protein. Remarkably, high-score motifs matching each of the
three SR protein ESE consensus sequences were found more frequently in exons than in introns.
For SF2/ASF, the density of high-score motifs was 4.3/kb of exon and 2.9/kb of intron; for
SRp40, the corresponding numbers were 7.9/kb of exon and 6.8/kb of intron; and for SRp55,
they were 5.5/kb of exon and 4.9/kb of intron. Owing to the large database size, the higher density of high-score motifs in exons
than in introns is statistically significant, and the p-values for
these  pairwise  comparisons  were  all  less  than  10 
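The density comparison reduces to two steps. First, count hits per kilobase. Second, test whether the exon rate exceeds the intron rate. The analysis does not name the statistical test used, so the sketch below assumes a one-sided conditional binomial comparison of two Poisson rates (normal approximation). As described above, the hit threshold is the mean score of the selected winner pool. Hits from overlapping windows are not fully independent, so a p-value computed this way is likely optimistic.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def count_high_score(window_scores: Sequence[float], winner_scores: Sequence[float]) -> int:
    """Count windows scoring >= the mean score of the selected winner pool."""
    threshold = mean(winner_scores)
    return sum(1 for s in window_scores if s >= threshold)


def density_per_kb(hits: int, length_nt: int) -> float:
    """Hits per kilobase of sequence."""
    return hits / (length_nt / 1000)


def compare_rates(hits_a: int, len_a: float, hits_b: int, len_b: float) -> Tuple[float, float]:
    """One-sided test that region A has a higher hit rate than region B.

    Under equal rates and given n = hits_a + hits_b, hits_a ~ Binomial(n, len_a / (len_a + len_b)).
    Returns (z, p) from the normal approximation to that binomial.
    """
    n = hits_a + hits_b
    p0 = len_a / (len_a + len_b)
    z = (hits_a - n * p0) / math.sqrt(n * p0 * (1 - p0))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return z, p_value
```

For illustration only, the SF2/ASF counts can be back-calculated from the rounded densities reported above: about 1853 exonic hits (4.3/kb × 431 kb) and 3770 intronic hits (2.9/kb × 1300 kb). These could be passed as `compare_rates(1853, 431, 3770, 1300)`. They are reconstructions, not the original counts.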

3.  Discussion 

I  have  developed  a  method  to  identify  exonic  splicing  enhancer  (ESE)  elements  that  can 
function specifically with individual SR proteins. This was accomplished using SR protein-deficient
HeLa extracts complemented with individual SR proteins, and a pool of pre-mRNAs
derived from mouse IgM, whose natural ESE in exon M2 was replaced by a 20-nt segment of
random sequence. I have successfully carried out this procedure with three human SR proteins
(SF2/ASF, SRp40, and SRp55) and have identified three novel classes of ESEs recognized and
activated by these proteins. These ESEs are functional and specific: in most cases, each type of
ESE activated splicing only in response to the SR protein in whose presence it was selected. The
SRp40-selected ESEs responded both to SRp40 and to SRp55, indicating that some, though
probably not all, ESEs recognized by SRp40 represent a subset of ESEs recognized by SRp55.
UV-crosslinking  and  immunoprecipitation  experiments  suggested  that  these  SR  proteins  interact 
directly  with  their  cognate  ESEs  through  sequence-specific  binding.  Sequence  analysis  revealed 
that  the  motifs  identified  by  the  selection  for  function  are  present  at  much  higher  density  in  regions 
corresponding  to  known,  natural  ESEs  than  in  other  exon  regions. 

The initial randomized pool of IgM-derived substrates consisted of 20 fmol of pre-mRNA
(~1.2 × 10^10 molecules), which is large enough to include all possible 16-mers (~4.3 × 10^9). The
longest motif I identified was the 7-nt consensus selected by SF2/ASF, indicating that the initial
random pool had sufficient complexity. In parallel SELEX experiments I also employed a different
RNA  pool,  in  which  only  14  positions  within  the  IgM  M2  exon  were  randomized.  Functional 
ESEs  were  also  selected  out  of  that  library  (data  not  shown),  suggesting  that  the  20-mer  library  can 
potentially  encode  most,  if  not  all,  natural  ESEs,  and  that  the  library  size  I  used  was  adequate.  The 
functional  SELEX  procedure  was  performed  for  only  three  rounds.  This  was  deemed  sufficient, 
since  all  the  winner  sequences  tested  proved  to  be  functional.  Additional  rounds  of  selection  would 
be  expected  to  result  in  loss  of  consensus  sequence  information,  as  only  the  most  efficiently 
spliced  RNAs  would  be  recovered. 
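The complexity estimate is simple arithmetic. Femtomoles times Avogadro's number gives the number of molecules, and 4^k gives the number of distinct k-mers. The sketch below only reproduces that arithmetic. Note that 4^20 (about 1.1 × 10^12) exceeds the pool size, so not every possible 20-nt random segment could have been represented.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_from_fmol(fmol: float) -> float:
    """Convert an amount in femtomoles to a number of molecules."""
    return fmol * 1e-15 * AVOGADRO


def distinct_kmers(k: int) -> int:
    """Number of distinct nucleotide sequences of length k."""
    return 4 ** k


pool = molecules_from_fmol(20)          # about 1.2e10 molecules
sixteen_mers = distinct_kmers(16)       # 4,294,967,296 (about 4.3e9)
copies_per_16mer = pool / sixteen_mers  # about 2.8 molecules per possible 16-mer
```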

My  experiments  confirm  and  extend  two  previous  studies  that  used  functional  selection  from 
random  pools  to  identify  novel  ESEs.  Tian  and  Kole  (1995)  randomized  a  20-nt  region  within  the 
context of a duplicated exon in a model β-globin pre-mRNA. They selected sequences that
promoted  inclusion  of  the  duplicated  exon  in  HeLa  nuclear  extract.  The  resulting  ESEs  after  five  or 
seven  selection  cycles  included  both  purine-rich  and  non-purine-rich  motifs  (Tian  and  Kole,  1995). 
A  related  approach  was  used  by  Coulter  et  al.  (1997)  to  identify  ESEs  that  promote  inclusion  of  the 
alternative  exon  5  of  chicken  cardiac  troponin  T.  In  this  case,  the  natural  ESE  was  replaced  by  a 
13-nt  randomized  segment,  and  the  selection  for  splicing  was  carried  out  by  three  rounds  of 
transient  transfection  into  QT35  quail  cells.  The  resulting  ESEs  included  both  purine-rich  elements 
and a novel class of AC-rich elements (ACEs) (Coulter et al., 1997). These pioneering studies
could not readily identify the factors responsible for ESE recognition (although SR proteins were
obvious candidates), because they relied on crude nuclear extracts or cultured cells. In addition, the
novel  ESEs  did  not  fall  into  obvious  consensus  sequences,  most  likely  because  they  represent  a 
complex  collection  of  elements  recognized  by  several  distinct  factors.  I  improved  this  general 
approach  by  performing  the  selections  in  extracts  dependent  upon  addition  of  individual  SR 
proteins,  which  allowed  us  to  identify  functional  ESEs  recognized  and  activated  by  each  SR 
protein,  and  therefore  to  derive  a  corresponding  consensus  sequence.  Our  selections  were  carried 
out in a different context from those in the above two studies, i.e., the last exon of the IgM
pre-mRNA. Although ESEs can generally function in different contexts (Watakabe et al., 1993;
Coulter  et  al.,  1997),  it  is  possible  that  the  exon  inclusion  assays  can  also  select  relatively  weak 
ESEs,  as  there  is  generally  a  fine  balance  between  exon  inclusion  and  exon  skipping.  On  the  other 
hand,  the  IgM  selection,  which  relies  on  splicing  versus  no-splicing,  may  yield  relatively  potent 
ESEs. 

Previous work also attempted to address the specificity of SR proteins in ESE recognition by
using conventional SELEX procedures based on selection for high-affinity binding. The first such
study was carried out with recombinant human His-tagged SF2/ASF or SC35 lacking the RS
domain (to decrease non-specific binding due to the overall basic charge) and nine cycles of
selection by immobilized metal affinity chromatography and amplification (Tacke and Manley,
1995). In the case of SF2/ASF, the selection yielded purine-rich motifs. Some of the selected
sequences could bind intact SF2/ASF and could function as SF2/ASF-specific ESEs when
multimerized and placed in the context of a chimeric α-globin/NCAM pre-mRNA. In the case of
SC35, although some of the selected sequences bound the intact protein with high affinity, they
were unable to function as ESEs in the context of the α-globin/NCAM pre-mRNA. Similar
experiments  carried  out  in  our  laboratory  using  purified  human  SF2/ASF  or  SC35,  a  20-nt  random 
region,  and  four  cycles  of  filter  binding  and  amplification  yielded  degenerate  consensus  sequences 
that  were  somewhat  different  from  those  reported  by  others  (A.  Hanamura,  I.  Watakabe,  and  A.R. 
Krainer, unpublished data). High-affinity RNA-binding sites for human SRp40 were recently
identified using His-tagged protein pre-incubated in S100 extract to allow phosphorylation by
endogenous kinases. This procedure yielded an SRp40 consensus high-affinity binding site, which
could function when multimerized and placed in the context of α-globin/NCAM pre-mRNA (Tacke
et al., 1997). This ESE was SRp40-specific, but also required an additional fraction from nuclear
extract to complement an S100 extract. The Drosophila ortholog of SRp55, B52, was also used to
perform  in  vitro  SELEX  via  nine  rounds  of  filter  binding  and  amplification.  Conserved  sequence 
and  secondary  structure  motifs  were  suggested  to  be  required  for  high-affinity  binding  (Shi  et  al., 
1997). 

Strikingly,  the  earlier  SELEX  experiments  based  on  binding  gave  very  different  results  from 
our  current  results  using  some  of  the  same  SR  proteins  but  with  selection  cycles  based  on  function. 
They  also  differed  from  a  recent  functional  study  of  SC35,  which  yielded  the  consensus  ESE  motif 
TSCNGYY (T. Shaal and T. Maniatis, personal communication). First, the consensus sequences
obtained  by  these  two  approaches  are  very  different.  The  SF2/ASF  motifs  defined  by  binding 
appear  to  be  a  subgroup  of  those  defined  by  function,  although  they  yield  relatively  low  scores 
when  analyzed  with  our  SF2/ASF  score  matrix.  It  should  be  noted  that  a  purine-rich  composition 
is  not  sufficient  for  function.  Rather,  specific  sequences  are  required,  since  only  some  oligo-purine 
segments  tested  can  function  as  ESEs  in  the  context  of  the  IgM  pre-mRNA  (Tanaka  et  al.,  1994). 
Similarly,  multiple  transition  mutations  in  the  natural  purine-rich  cTNT  ESE  abolished  its  function 
(Ramchatesingh  et  al.,  1995).  Second,  many  of  the  winner  sequences  obtained  by  binding 
protocols  were  not  functional  as  ESEs.  In  contrast,  among  the  winner  sequences  obtained  by  our 
functional  selection  protocol,  many  were  tested  and  all  of  these  were  functional  ESEs.  Third,  the 
complexity  of  the  winner  sequences  identified  by  binding  SELEX  is  much  lower  than  that  of  the 
ESEs  identified  by  functional  SELEX.  As  a  result,  the  consensus  sequences  obtained  from  the 
binding  selection  are  less  degenerate  than  those  I  obtained  through  functional  selection.  This  may 
be  due  in  part  to  the  use  of  more  selection/amplification  cycles  in  some  of  the  binding  SELEX 
experiments.  In  the  natural  situation,  exon  sequences  are  obviously  very  diverse.  Degenerate 
sequence  specificity  is  probably  essential  for  a  limited  number  of  SR  proteins  to  be  able  to 
recognize  a  very  large  number  of  ESE-containing  exons  in  different  genes. 

The different results obtained by binding selection and functional selection protocols shed light
on the mechanisms of ESE function. The binding selection is based on the affinity of RNA-protein
interactions, and the iterative protocol is designed to yield the binding sites with the highest
affinity  for  the  protein  of  interest.  However,  it  appears  that  the  best  binding  sites  are  not 
necessarily  the  best  functional  sites,  and  in  some  cases  a  high  affinity  may  preclude  function.  In 
addition,  optimal  interactions  between  an  SR  protein  and  its  cognate  ESEs  may  require  other 
splicing  components,  as  opposed  to  just  the  purified  protein.  The  binding  protocol  is  carried  out 
with  the  purified  protein,  whereas  the  functional  selection  protocol  is  carried  out  in  the  presence  of 
all  components  required  for  splicing.  There  are  also  technical  reasons  why  iterative  binding 
protocols  may  not  yield  optimal  functional  sites.  The  binding  affinity  and/or  the  specificity  of  the 
binding  may  be  significantly  affected  by  the  idiosyncrasies  of  the  binding  assay  employed.  For 
example,  in  the  most  common  binding  assays,  the  electric  field  and  electrophoresis  buffer,  or 
interactions with the nitrocellulose filter, the agarose resin, or the gel may influence binding (Irvine
et al., 1991). Indeed, several of the SF2/ASF and SRp40 winners identified by iterative binding
failed to bind to the cognate SR protein when analyzed by a different binding assay (Tacke and
Manley, 1995; Tacke et al., 1997). Another contributing factor to the discrepancy between the
results  obtained  in  binding  and  functional  assays  may  be  that  in  at  least  some  applications  of  the 
former,  truncated  proteins  lacking  the  RS  domain  were  used  (Tacke  and  Manley,  1995).  Although 
the  precise  functions  of  the  RS  domains  are  not  completely  understood,  they  appear  to  be  important 
for  protein-protein  and/or  RNA-protein  interactions  (Wu  and  Maniatis,  1993;  Tacke  et  al.,  1997; 
Xiao  and  Manley,  1997).  Thus,  deletion  of  the  RS  domain  may  affect  the  binding  specificity,  as 
may an incorrect or incomplete phosphorylation state of the domain. In addition, the use of
oligo-histidine or other tags may also affect the binding specificity. Finally, in the case of the different
consensus  sequences  obtained  previously  for  Drosophila  B52  (Shi  et  al.,  1997)  and  in  the  present 
study  for  human  SRp55,  the  binding  specificity  may  have  diverged  considerably  between 
arthropods  and  vertebrates.  For  example,  the  enhancer  complex  formed  on  the  PRE  of  the 
Drosophila  doublesex  pre-mRNA  binds  SRp55/B52  in  Drosophila  Kc  cell  extracts,  but  does  not 
appear  to  bind  human  SRp55  in  HeLa  cell  extracts  (Lynch  and  Maniatis,  1996). 

For a given constitutively spliced pre-mRNA substrate, such as β-globin pre-mRNA, many
SR proteins can individually support splicing in an S100 complementation assay (Fu et al., 1992;
Zahler  et  al.,  1992;  Screaton  et  al.,  1995;  Zhang  and  Wu,  1996).  This  may  reflect  a  partial  overlap 
in  the  functions  of  the  different  SR  proteins,  or  there  may  be  multiple,  redundant  binding  sites  for 
the  different  SR  proteins  on  certain  pre-mRNAs.  The  striking  phylogenetic  conservation  of  each 
member  of  the  SR  family  argues  against  their  functional  redundancy.  Indeed,  many  examples  of 
substrate-specificity  differences  among  SR  proteins  have  been  described,  such  as  in  alternative 
splicing  or  commitment  assays  (Fu,  1993;  Zahler  et  al.,  1993a;  Screaton  et  al.,  1995;  Wang  and 
Manley,  1995).  The  fact  that  chicken  SF2/ASF  and  Drosophila  SRp55/B52  are  essential  for  cell 
and  embryo  viability,  respectively  (Ring  and  Lis,  1994;  Peng  and  Mount,  1995;  Wang  et  al., 
1996)  argues  that  at  least  some  functions  important  for  development  or  cell  viability  are  uniquely 
carried  out  by  single  SR  proteins  in  vivo. 

The specific recognition of ESEs by SR proteins is well documented. A purine-rich sequence
in the ED-A exon of the fibronectin gene strongly enhances inclusion of this exon. Gel shift assays
showed that this purine-rich sequence interacts specifically with SR proteins (Lavigueur et al.,
1993). The last exon of the bovine growth hormone pre-mRNA has a purine-rich ESE required for
splicing of the preceding intron; SF2/ASF binds specifically to this element in HeLa cell nuclear
extracts and appears to be required for its function (Sun et al., 1993). The dsx repeat element
(dsxRE) and purine-rich element (PRE) of the doublesex pre-mRNA bind to different SR proteins
in both HeLa and Drosophila Kc cell extracts. The PRE preferentially binds SF2/ASF in HeLa cell
extracts, and one of the SRp30 proteins in Kc cell extracts (Lynch and Maniatis, 1994; Lynch and
Maniatis, 1996). The dsxRE binds 9G8 in HeLa cell extracts and RBP1/SRp20 in Kc cell extracts
(Lynch and Maniatis, 1994; Heinrichs and Baker, 1995; Lynch and Maniatis, 1996). The ESE of
the alternatively spliced exon 5 of avian cardiac troponin T (cTNT) pre-mRNA binds to SF2/ASF,
SRp40, SRp55 and SRp75 in HeLa nuclear extract. Purified SRp40 and SRp55 can activate
splicing of exon 5, but SC35 cannot (Ramchatesingh et al., 1995). The HIV tat exon 3 has a
purine-rich ESE (Amendt et al., 1995; Staffa and Cochrane, 1995), and SF2/ASF but not SC35 can
commit a tat minigene transcript to the splicing pathway (Fu, 1993; Chandler et al., 1997).
Assembly of an enhancer complex (Enh complex, which resembles the commitment or E complex)
in vitro results in recruitment of different SR proteins depending upon the ESE sequence, as
judged by UV-crosslinking (Staknis and Reed, 1994). For example, the purine-rich bGH ESE
crosslinked efficiently to SRp30 and at lower levels to SRp20 and SRp40 in the Enh complex. In
contrast, the ESE of the avian sarcoma and leukosis virus (ASLV) crosslinked efficiently to SRp40
and poorly to SRp20 and SRp30, even though this ESE is also purine rich. Changing the
purine-rich sequence of the ASLV ESE to a non-purine-rich sequence that retained ESE activity resulted in
an increase in SRp30 crosslinking and a decrease in SRp40 crosslinking (Staknis and Reed,
1994).

My data address the molecular basis of the redundancy and specificity of SR proteins. The
different consensus sequences of the three types of in vitro-selected ESEs, and their different
responses to individual SR proteins, provide an indication of the specificity of SR proteins in ESE
recognition  and  function.  The  consensus  sequence  of  SF2/ASF-selected  ESEs,  SRSASGA, 
matches  the  sequence  of  most  purine-rich  ESEs  characterized  to  date.  It  is  worthwhile  to  note  that 
this  sequence  is  devoid  of  U  residues.  In  the  HPRT  and  IgM  genes,  the  presence  of  C  residues 
within  the  purine-rich  ESEs  was  compatible  with  enhancer  function,  whereas  changing  the  C 
residues  to  U  residues  abolished  their  enhancer  activity  (Tanaka  et  al.,  1994).  The  SRp55-selected 
winners  also  had  a  reduced  U  content,  and  I  suspect  that  a  low  U  composition  contributes  to  the 
information  content  that  defines  ESEs  recognized  by  these  SR  proteins. 
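The consensus sequences use IUPAC degenerate-base codes (R = A/G, S = C/G, D = A/G/T, and so on). The minimal sketch below expands such a consensus into a regular expression and scans a sequence for overlapping matches. It also counts how many distinct sequences a consensus stands for. Every position is treated as a hard constraint, which simplifies the score-matrix approach used in the Results. The test sequences are arbitrary examples, not sequences from this study. Expanding SRSASGA produces no T/U at any position, which matches the point above that this consensus is devoid of U residues.

```python
import re
from math import prod
from typing import List, Tuple

# IUPAC nucleotide codes; U is treated as T before matching.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_regex(consensus: str) -> str:
    """Translate a degenerate consensus into a character-class regex."""
    return "".join(f"[{IUPAC[c]}]" for c in consensus.upper())


def degeneracy(consensus: str) -> int:
    """Number of distinct sequences that match the consensus."""
    return prod(len(set(IUPAC[c])) for c in consensus.upper())


def find_matches(consensus: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every match, including overlapping ones."""
    # A zero-width lookahead with a capture group lets finditer report overlapping hits.
    pattern = re.compile(f"(?=({consensus_regex(consensus)}))")
    seq = sequence.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
```

By this count, SRSASGA covers 16 distinct heptamers and ACDGS covers 6 pentamers. That is one concrete sense in which these motifs are degenerate.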

The  SRp40-selected  ESEs  share  a  relatively  short  consensus  sequence,  ACDGS.  They  could 
be  activated  by  SRp55,  but  not  by  SF2/ASF.  The  fact  that  SRp55  could  activate  SRp40-selected 
ESEs  suggests  that  these  two  SR  proteins,  which  are  closely  related  in  domain  structure, 
unmodified  molecular  mass  (31.2  kDa  for  SRp40,  39.6  kDa  for  SRp55),  and  sequence  (65% 
identity)  (Screaton  et  al.,  1995),  also  have  some  functional  overlap.  It  is  unlikely  that  the 
sequences  selected  by  SRp40  fortuitously  comprise  a  distinct  SRp55  recognition  site,  but  not  an 
SF2/ASF  site,  since  all  of  the  seven  independent  SRp40  ESEs  tested  had  similar  properties.  Using 
the SF2/ASF score matrix to search the seven examined SRp40-selected ESEs, I found that most of them had a
score lower than the minimum score of the SF2/ASF-selected ESEs, which could explain why
SRp40-selected  ESEs  were  not  activated  by  SF2/ASF.  I  do  not  know  why  E4,  which  had  a  score 
higher  than  the  average  score  of  SF2/ASF-selected  ESEs,  was  not  activated  by  SF2/ASF. 
However,  it  is  likely  that  the  sequence  context,  e.g.,  in  the  form  of  negative  elements  or  silencers, 
somehow  prevents  activation  of  this  motif  by  SF2/ASF.  A  related  observation  was  the  fact  that 
SRp40  inhibited  splicing  of  some  SF2/ASF-selected  ESEs  even  in  the  presence  of  SF2/ASF.  This 
may  be  due  to  formation  of  inhibitor  complexes  with  SRp40,  such  that  SR  proteins  may  also 
participate in exonic silencer function, depending upon the sequence context. An interesting
implication  is  that  the  variable  expression  levels  of  these  antagonistic  SR  proteins  may  determine 
the  cell  type-specific  function  of  certain  ESEs. 
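The consensus motifs in this report are written with IUPAC degenerate codes. The legends of Figures 2-4 define them: S = G or C, R = A or G, D = A, G or U, K = U or G, and M = A or C. The sketch below shows how a consensus such as ACDGS can be turned into a regular expression and scanned along an RNA sequence to find exact degenerate matches. It assumes exact matching is what is wanted. The function names and the test sequence are hypothetical. This is not the motif-search procedure used in the study, which scored sequences with score matrices.

```python
import re

# IUPAC ambiguity codes appearing in the Figure 2-4 legends (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",   # R = A or G
    "S": "GC",   # S = G or C
    "D": "AGU",  # D = A, G or U
    "K": "GU",   # K = U or G
    "M": "AC",   # M = A or C
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus (e.g. 'ACDGS') into a regex.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    matches are also reported.
    """
    parts = []
    for code in consensus.upper():
        bases = IUPAC_RNA[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("(?=(" + "".join(parts) + "))")

def find_matches(consensus, seq):
    """Return (0-based start, matched bases) for every exact consensus match."""
    rna = seq.upper().replace("T", "U")  # also accept DNA-style input
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]

# Hypothetical test sequence, not one of the selected winners.
print(find_matches("ACDGS", "GGACAGCAUACUGGA"))  # [(2, 'ACAGC'), (9, 'ACUGG')]
```

Exact matching treats every position as all-or-nothing. This is one reason the report relies on score matrices instead: a score matrix can rank near-matches, such as E4, against the distribution of scores from the selected ESEs.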

I did not observe any cooperative effects among the SR proteins tested. However, the fact that
most of the ESEs I identified gave higher splicing efficiencies in nuclear extract than in S100
extract complemented with SR proteins suggests that other SR proteins and/or additional splicing
factors may be required for optimal ESE recognition or function. With other substrates, such as
β-globin or certain natural ESE-dependent pre-mRNAs, comparable splicing efficiencies can be
obtained in the two systems. Still other natural or synthetic ESE-dependent pre-mRNAs can only
splice in nuclear extract (Sun et al., 1993; Tacke and Manley, 1995). The ESEs I obtained were
selected to function in S100 extract plus an SR protein, and hence it is not surprising that at least
basal function could be observed in this complementation system. However, maximal activity
appears to require one or more additional factors that may be limiting in the S100 extract.

Many natural ESEs have been found in the last several years. Most of these well-defined ESEs
are purine rich, although this nucleotide composition may reflect an experimental bias. First, many
of the biochemical studies were carried out in HeLa cell nuclear extract, in which SF2/ASF, which
prefers purine-rich sequences, may be the most abundant SR protein. Second, purine-rich motifs
may be easier to find by visual inspection, which, together with the precedent of known purine-rich
ESEs, makes them more likely to be studied further. I have identified three new degenerate motifs,
which are not necessarily purine rich. Significantly, the SF2/ASF and SRp40 motifs I defined
occur more frequently (and with higher scores) within exon segments corresponding to known
ESEs than elsewhere in the exons. All of the motifs also occur more often in exons than in introns,
and may thus contribute to defining exon-intron boundaries. These consensus sequences may be
useful for the prediction of natural ESEs in uncharacterized exons. My data also suggest that target
sites for multiple SR proteins are clustered within natural ESEs. This may explain why large
deletions are often required to inactivate natural ESEs. The SRp55 ESE consensus motif did not
always correlate with the location of natural ESEs. Interestingly, in the caldesmon gene, SF2/ASF
and SRp40 sites are enriched in the 3' portion of exon 5 that is included in smooth muscle cells,
whereas SRp55 sites occur more frequently in the 5' portion of exon 5 that is included in all cell
types. Inclusion of the constitutively spliced upstream segment of exon 5 may also require a
functional ESE, which I would predict, on the basis of the present data, to be SRp55-dependent.
The differential recognition of the alternative 5' splice sites associated with caldesmon exon 5
by different SR proteins may be responsible for the proper developmental and tissue-specific
expression of caldesmon by alternative splicing.
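The statement that motifs "occur more frequently" within mapped ESEs than elsewhere in an exon can be expressed as a comparison of hit densities. The sketch below makes that comparison under stated assumptions:

- Hit start positions have already been produced by some scanner, either score-matrix based or consensus based.
- The mapped ESE is given as a half-open interval.
- Density is normalized per possible motif start position.

This normalization and the helper name are illustrative choices, not the statistic used in this study. The exon length and coordinates in the example are hypothetical.

```python
def hit_density(hit_starts, motif_len, seq_len, region):
    """Return (density inside region, density outside region) of motif hits.

    hit_starts: distinct 0-based start positions of high-scoring matches.
    region: half-open (start, end) interval, e.g. a previously mapped ESE.
    Density = hits / possible motif start positions. A hit counts as
    inside only if its whole motif window lies within the region.
    """
    start, end = region
    n_total = seq_len - motif_len + 1                 # all possible starts
    n_in = max(0, (end - start) - motif_len + 1)      # starts fully inside
    n_out = n_total - n_in                            # everything else
    hits_in = sum(1 for p in hit_starts if start <= p <= end - motif_len)
    hits_out = len(hit_starts) - hits_in
    return (hits_in / n_in if n_in else 0.0,
            hits_out / n_out if n_out else 0.0)

# Hypothetical 100-nt exon, mapped ESE at positions 40-63, 5-nt motif.
inside, outside = hit_density([10, 42, 50, 55, 90], 5, 100, (40, 64))
print(f"inside: {inside:.3f} per site, outside: {outside:.3f} per site")
```

Normalizing by possible start positions keeps short ESE segments and long flanking regions comparable. A raw hit count would favour whichever region is longer.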
CONCLUSION

In the past year I have concentrated on the mechanisms of recognition of splicing enhancers,
which are relevant to both conventional and non-conventional intron splicing. Three novel classes
of exonic splicing enhancers (ESEs) recognized by human SF2/ASF, SRp40 and SRp55 have
been identified by an iterative functional selection procedure. These ESEs are functional in splicing
and are highly specific. In most cases, only the cognate SR protein can efficiently recognize an ESE
and activate splicing. An interesting exception is that SRp40-selected ESEs can function with either
SRp40 or SRp55. UV-crosslinking competition and immunoprecipitation experiments showed that
SR proteins recognize their cognate ESEs in nuclear extract by direct and specific binding. A motif
search algorithm was used to derive consensus sequences for ESEs recognized by each SR
protein, and to show that such consensus sequences occur at high frequencies in exonic regions,
particularly those corresponding to naturally occurring, mapped ESEs. Multiple high-score motifs
were also found in the exons of the proliferating cell nucleolar antigen (P120) gene, including those
adjacent to the non-conventional intron F. Future studies will focus on the identification of splicing
factors essential for the function of splicing enhancers, in the context of either conventional or
non-conventional introns.
APPENDICES
Table  legends 

Table 1. Summary of the activities and specificities of three types of in vitro-selected ESEs.

The in vitro-selected ESEs were tested for function as part of IgM minigene pre-mRNAs in HeLa
S100 extract complemented with recombinant SF2/ASF, SRp40, or SRp55. The sequences of the
B, E, and C winner series are given in Figures 2, 3, and 4, respectively. ND: not determined.

Figure  legends 

Figure  1.  Procedure  for  randomization  and  selection  of  exon  splicing  enhancers 
(ESEs). 

(A) The natural ESE in mouse IgM exon M2 was replaced by a 20-nt segment of random
sequence, and a library of pre-mRNAs was constructed by overlap-extension PCR and in vitro
transcription. A sample of this pool, representing ~1.2 X 10*® pre-mRNA molecules, was then
spliced in vitro by complementation of an S100 extract with individual recombinant SR proteins.
The pool of spliced mRNA products was gel purified, and the sequences corresponding to the ESE
region were either rebuilt into pre-mRNA template molecules for a new round of selection, or
subcloned and sequenced. The sequences were analyzed by a motif-search algorithm to identify
common patterns. (B) Sequence of the M2 exon of the mouse IgM gene. The sequence of the
previously mapped ESE is shown in uppercase.
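The randomized segment is 20 nt long, so the theoretical sequence space is 4^20 (about 1.1 × 10^12) distinct inserts. The sketch below estimates what fraction of that space a pool of N molecules is expected to contain. It assumes an idealized pool in which every base is equally likely at every position and every molecule is an independent draw. Real pools need not be uniform in composition. The pool size is left as a parameter rather than taken from the legend, and the values in the loop are illustrative only.

```python
import math

RANDOM_LEN = 20                   # length of the randomized segment (Figure 1A)
SEQUENCE_SPACE = 4 ** RANDOM_LEN  # distinct possible 20-nt inserts

def fraction_sampled(n_molecules, space=SEQUENCE_SPACE):
    """Expected fraction of sequence space present at least once in a pool.

    Assumes independent, uniformly random molecules, so each sequence's copy
    number is ~Poisson(n_molecules / space) and the covered fraction is
    1 - exp(-n_molecules / space). expm1 keeps precision for tiny ratios.
    """
    return -math.expm1(-n_molecules / space)

print(SEQUENCE_SPACE)  # 1099511627776
for n in (1e11, 1e12, 1e13):  # illustrative pool sizes only
    print(f"{n:.0e} molecules -> {fraction_sampled(n):.3f} of sequence space")
```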

Figure  2.  Sequence  alignment  of  SF2/ASF-specific  ESEs  (A)  and  sequences  from 
the  initial  random  pool  (B). 

A consensus motif was identified as described in the text. The sequences are aligned on the basis
of the best fit to the consensus within each sequence. Nucleotides in the boxed alignment that
match the consensus position are shown white on black; mismatched nucleotides are not shaded.
Underlined nucleotides are from the constant region flanking the randomized segment. The
nucleotide composition of the randomized segment in the selected pool is provided in the lower left
corner. S = G or C; R = A or G.
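The alignment convention in this legend can be mimicked in plain text. The sketch below:

- finds each sequence's best-fit window to the IUPAC consensus;
- brackets that window;
- lowercases mismatched bases inside it, in place of the white-on-black shading;
- pads rows so the windows line up.

It assumes "best fit" means the window compatible with the consensus at the most positions, with ties going to the leftmost window. That scoring rule, the helper names, and the example sequences are hypothetical, not the study's alignment procedure. Each sequence is assumed to be at least as long as the consensus.

```python
# IUPAC ambiguity codes used in the Figure 2-4 legends (RNA alphabet).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "S": "GC", "D": "AGU", "K": "GU", "M": "AC",
}

def best_fit(consensus, seq):
    """Return (offset, matches) for the window of seq that agrees with the
    consensus at the most positions; ties go to the leftmost window."""
    k = len(consensus)
    best = (0, -1)
    for i in range(len(seq) - k + 1):
        n = sum(b in IUPAC_RNA[c] for b, c in zip(seq[i:i + k], consensus))
        if n > best[1]:
            best = (i, n)
    return best

def align(consensus, seqs):
    """Pad sequences so best-fit windows line up; bracket each window and
    lowercase bases inside it that do not match the consensus."""
    consensus = consensus.upper()
    seqs = [s.upper() for s in seqs]
    fits = [best_fit(consensus, s) for s in seqs]
    lead = max(off for off, _ in fits)
    k = len(consensus)
    rows = []
    for s, (off, _) in zip(seqs, fits):
        window = "".join(
            b if b in IUPAC_RNA[c] else b.lower()
            for b, c in zip(s[off:off + k], consensus)
        )
        rows.append(" " * (lead - off) + s[:off] + "[" + window + "]" + s[off + k:])
    return rows

for row in align("ACDGS", ["GGACAGCAU", "UACUGGA", "ACCGG"]):
    print(row)
```

With these hypothetical inputs, the three printed rows are `GG[ACAGC]AU`, ` U[ACUGG]A`, and `  [ACcGG]`. The lowercase `c` marks the one position where ACCGG departs from ACDGS.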

Figure  3.  Sequence  alignment  of  SRp40-specific  ESEs. 

D  =  A,  G  or  U.  See  legend  of  Figure  2  for  details. 

Figure  4.  Sequence  alignment  of  SRp55-specific  ESEs. 

K  =  U  or  G;  M  =  A  or  C.  See  legend  of  Figure  2  for  details. 

Figure  5.  The  selected  sequences  are  functional  SR  protein-dependent  ESEs. 

(A) In vitro splicing of pre-mRNAs containing the SF2/ASF-selected winner sequences in HeLa
nuclear extract (lanes 1, 4, 7, 10, 13, 16, 19, and 22), S100 extract alone (lanes 2, 5, 8, 11, 14,
17, 20, and 23), or S100 extract complemented with 20 pmol of recombinant SF2/ASF (lanes 3, 6, 9,
12, 15, 18, 21, and 24). (B) Splicing of SRp40-selected winner sequences in nuclear extract (lanes
1, 4, 7, 10, 13, 16, and 19), S100 extract alone (lanes 2, 5, 8, 11, 14, 17, and 20), or S100 extract
complemented with 20 pmol of recombinant SRp40 (lanes 3, 6, 9, 12, 15, 18, and 21). (C) Splicing
of SRp55-selected winner sequences in nuclear extract (lanes 1, 5, 9, 13, 17, 21, 25, and 29),
S100 extract alone (lanes 2, 6, 10, 14, 18, 22, 26, and 30), or S100 extract complemented with
10 pmol of SRp55 (lanes 3, 7, 11, 15, 19, 23, 27, and 31) or with 20 pmol of SRp55 (lanes 4, 8,
12, 16, 20, 24, 28, and 32). The structures and mobilities of the precursor, intermediates, and
products of splicing (Watakabe et al., 1993) are shown next to each autoradiogram.
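The lane assignments in Figures 5 and 6 follow an interleaved pattern. Each winner substrate is run under every condition in adjacent lanes. With k conditions, the lanes for the j-th condition (counting from 1) are therefore j, j + k, j + 2k, and so on. The short sketch below regenerates such lane lists. The helper name and condition labels are hypothetical.

```python
def lane_map(conditions, n_substrates):
    """Return {condition: [lane numbers]} for an interleaved gel layout.

    Assumes each substrate occupies len(conditions) adjacent lanes, in the
    same condition order every time, with lanes numbered from 1.
    """
    k = len(conditions)
    return {
        cond: [s * k + j + 1 for s in range(n_substrates)]
        for j, cond in enumerate(conditions)
    }

# Figure 5A layout: 8 SF2/ASF-selected substrates x 3 conditions.
layout = lane_map(["nuclear extract", "S100 alone", "S100 + SF2/ASF"], 8)
print(layout["S100 alone"])  # [2, 5, 8, 11, 14, 17, 20, 23]
```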

Figure  6.  SR  protein  specificity  of  in  vitro-selected  ESEs. 
(A) SRp40-selected ESEs are inactive with SF2/ASF. Splicing was carried out in HeLa S100
extract complemented with 20 pmol of SF2/ASF (lanes 1, 4, 7, 10, 13, 16, and 19), 20 pmol of
SRp40 (lanes 2, 5, 8, 11, 14, 17, and 20), or 20 pmol of SF2/ASF plus 20 pmol of SRp40 (lanes
3, 6, 9, 12, 15, 18, and 21). (B) SRp40-selected ESEs can function in the presence of SRp55.
Splicing was carried out in S100 extract complemented with 20 pmol of SRp55 (lanes 1, 4, 7, 10,
13, and 16), 20 pmol of SRp40 (lanes 2, 5, 8, 11, 14, and 17), or 20 pmol of SRp55 plus 20
pmol of SRp40 (lanes 3, 6, 9, 12, 15, and 18).

Figure 7. Specific binding of SF2/ASF to an SF2/ASF-selected ESE.

(A) UV-crosslinking competition binding assay. Radiolabeled exon M2 RNA (20 fmol)
comprising the B1 winner sequence (B1E) was incubated under splicing conditions in HeLa
nuclear extract. Subsequent UV-crosslinking and RNase digestion resulted in label transfer
predominantly to two proteins of 34 kDa and 47 kDa. The former, which binds specifically, is
presumed to be SF2/ASF (see panel B). Cold competitor RNAs containing either the B1 winner
insert, an SRp55-selected insert (C4E), an SRp40-selected insert (E7E), or other control sequence
inserts were present in excess, as indicated above the autoradiogram. Lane 1: no competitor; in the
remaining lanes, the indicated competitors were present in 5-fold excess (even lanes) or 50-fold
excess (odd lanes) over the labeled B1E RNA. (B) Immunoprecipitation of SF2/ASF UV-crosslinked
to the B1E RNA. UV-crosslinking was carried out as in panel A, lane 1. A 5%
equivalent of the input was loaded directly (lane 1). Parallel reactions were incubated with
anti-SF2/ASF monoclonal antibody (lane 2) or with a control antibody (lane 3), and the
immunoprecipitates were recovered in SDS gel loading buffer. In both panels, the samples were
analyzed by SDS-PAGE and autoradiography.

Figure 8. Distribution of in vitro-selected ESE consensus sequences within exons
comprising natural ESEs.

Score matrices were built for each class of in vitro-selected ESE according to the frequency of
each nucleotide at individual positions of each consensus sequence. The indicated natural exon
sequences were searched with each score matrix, and the resulting scores (y axis) were plotted
against the nucleotide positions for each exon (x axis). Note that the x-axis scales differ between
panels because of the different exon sizes. Graphs are shown for mouse IgM exon M2, the bovine
growth hormone 3' exon, the Drosophila doublesex female-specific exon, and chicken caldesmon
exon 5. High-score motif matches are shown by blue (SF2/ASF), red (SRp40), and yellow
(SRp55) vertical bars. The green horizontal bars under the x axis indicate previously mapped ESEs
or the doublesex purine-rich element (PRE). The black horizontal bars denote the doublesex repeat
elements (dsxREs).
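The legend describes score matrices built from per-position nucleotide frequencies and run along natural exon sequences. The sketch below illustrates that idea under stated assumptions. The report specifies only that scores derive from positional frequencies, so the following choices are assumptions:

- a log2-odds formulation;
- a pseudocount of 0.5;
- a uniform background.

The lowest score among the training instances is used as a hit threshold. This echoes the comparison in the text with the minimum score of the SF2/ASF-selected ESEs. The toy aligned sequences are hypothetical, not the actual winner sequences.

```python
import math

BASES = "ACGU"

def build_score_matrix(aligned, pseudocount=0.5, background=None):
    """Build a log2-odds score matrix from equal-length aligned instances.

    Assumption: per-position frequencies (plus a pseudocount) are compared
    with a background composition, uniform unless one is supplied.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("all aligned instances must have the same length")
    matrix = []
    for i in range(width):
        column = [s[i].upper() for s in aligned]
        total = len(column) + len(BASES) * pseudocount
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix

def window_score(matrix, window):
    """Sum the per-position scores of one window (same length as matrix)."""
    return sum(col[b] for col, b in zip(matrix, window.upper()))

def scan(matrix, seq):
    """Score every window along seq; returns a list of (0-based start, score)."""
    w = len(matrix)
    return [(i, window_score(matrix, seq[i:i + w])) for i in range(len(seq) - w + 1)]

def training_minimum(matrix, aligned):
    """Lowest score among the training instances, usable as a threshold."""
    return min(window_score(matrix, s) for s in aligned)

# Hypothetical aligned motif instances and target sequence (RNA alphabet only).
training = ["ACAGC", "ACUGG", "ACAGG"]
m = build_score_matrix(training)
cutoff = training_minimum(m, training)
print([i for i, s in scan(m, "UUACAGGUU") if s >= cutoff])
```

A log-odds matrix rewards bases that are over-represented relative to background and penalizes under-represented ones. Summing raw frequencies would also fit the legend's wording but cannot go negative.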
gtgaaatgactctcagcatGGAAGGACAGCAGAGACCAAGAGA

TCCTCCCACAGGGACACTACCTCTGGGCCTGGGATA 

CCTGACTGTATGActagtaaacttattcttacgtctttcctgtgttgccctccag 

cttttatctctgagatggtcttctttctagagtcgacctgc 
SEQUENCES OF INITIAL RANDOM POOL

R1 

UCCUACGGUUGUUACCGGGA 

R2 

AGUGCGGUCACCGGAUGAGC 

R3 

UAUGACGAGCGGGAUCCGGG 

R4 

GUAGGCGUCUGGUGGGGGGG 

R5 

AUUCAGCCUAGUUGGGUGG 

R6 

CGUUAUACCGCGCCUGGGUG 

R7 

UCAGUGGAGGUUGUGGCACU 

R8 

GGGCCAUCGUUGUGGAGAAC 

R9 

UGGGCUCAGGCCGGCCGGUG 

R10

CUCCUCGUUUAGGGGGUAGG 

R11

GUGGGGUUCCGAUGGGGCCG 

R12 

AUAGCGGAUUACGGGCGGC 

R13 

GGGGAGGAGUUCGUGCUGAG 

R14 

GUCAUUAACGGACACAUGGC 

R15 

GUGAAUAUUGCGAUGUGAG 

R16 

CGUGAGUGAUUUCCACAACA 

R17 

CUUCAAGAUAGAACGUGGCU 

R18  (A13) 

AGACAGCGUGGGCGGGAGUG 

R19 

AGAGACAUCGAGGGACUAGG 

R20 

CACCGCGGUGCCACCUCCAC 

R21 

GAGAGACUGUUUUAGUACAC 

R22 

UGAGGACCAAAAGGGUGAAG 

R23 

UAGGGCGAGUAGUGAUAAUG 

R24 

UUGGCAUGCAGGAUAUGCGG 

R25 

AGUGCCUCGGUCAAACGGGG 

R26 

ACGAUCGGCAUGUCUUGUCG 

R27 

GGGGACGAAGCAAUAUGGGC 

R28 

UCGCAGACCAUCAAAUGCGG 

R29 

AGAUUUGCAGAUCGGUUGGA 

R30 

GAGGGAAGUAGAAAUGGCGC 

G = 39%, A = 21%, U = 21%, C = 19%
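Composition percentages like these can be obtained by pooling base counts over all randomized segments. The sketch below assumes RNA sequences given as plain strings and counts only G, A, U, and C. It is run on a toy input only, and no claim is made about the output for the sequences listed above.

```python
from collections import Counter

def composition(seqs, bases="GAUC"):
    """Percentage of each nucleotide, pooled over all sequences.

    Only the listed bases are counted, so stray characters do not skew totals.
    """
    counts = Counter("".join(seqs).upper())
    total = sum(counts[b] for b in bases)
    return {b: 100.0 * counts[b] / total for b in bases}

print(composition(["GGAU", "GGCA"]))  # {'G': 50.0, 'A': 25.0, 'U': 12.5, 'C': 12.5}
```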